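During radiotherapy for patients with lung cancer, the radiation delivered to healthy tissue around the tumor needs to be minimized, which is difficult because of respiratory motion and the latency of linear accelerator systems. In the proposed study, we first use the Lucas-Kanade pyramidal optical flow algorithm to perform deformable image registration of chest computed tomography scan images of four patients with lung cancer. We then track three internal points close to the lung tumor based on the previously computed deformation field and predict their positions with a recurrent neural network (RNN) trained using real-time recurrent learning (RTRL) and gradient clipping. The breathing data is quite regular, sampled at approximately 2.5 Hz, and includes an artificial drift in the spine direction. The motion amplitude of the tracked points ranges from 12.0 mm to 22.7 mm. Finally, we propose a simple method for recovering and predicting 3D tumor images from the initial tumor image, based on a linear correspondence model and Nadaraya-Watson nonlinear regression. The root-mean-square error, maximum error, and jitter of the RNN prediction on the test set were smaller than the same performance measures obtained with linear prediction and least mean squares (LMS). In particular, the maximum prediction error associated with the RNN, equal to 1.51 mm, is respectively 16.1% and 5.0% lower than the maximum errors associated with linear prediction and LMS. The average prediction time with RTRL is 119 ms, which is less than the 400 ms marker position sampling time. The tumor position in the predicted images appears visually correct, which is confirmed by the high mean cross-correlation of 0.955 between the original and predicted images.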
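The Nadaraya-Watson regression named in the abstract above is a kernel-weighted average of training targets. A minimal NumPy sketch follows, assuming a Gaussian kernel and a hand-picked bandwidth; the function name `nadaraya_watson`, the kernel choice, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=1.0):
    """Gaussian-kernel Nadaraya-Watson estimate (hypothetical helper).

    x_train: (n, d) inputs, y_train: (n,) or (n, m) targets,
    x_query: (q, d) query inputs. Returns (q,) or (q, m) estimates.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_2d(np.asarray(x_query, dtype=float))

    # Squared Euclidean distance between every query and training input.
    sq_dist = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)

    # Subtract the row minimum before exponentiating so that at least one
    # weight per query equals 1 and the normalization never divides by zero.
    sq_dist = sq_dist - sq_dist.min(axis=1, keepdims=True)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    weights /= weights.sum(axis=1, keepdims=True)

    # Each estimate is a convex combination of the training targets.
    return weights @ y_train

# Toy usage: three 1D samples, one query at a training input.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 10.0, 20.0])
estimate = nadaraya_watson(x, y, np.array([[1.0]]), bandwidth=0.05)
```

Because the weights are non-negative and sum to one, every estimate stays within the range of the training targets; the bandwidth controls how local the average is.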
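During lung radiotherapy, the position of infrared reflective objects can be recorded to estimate the tumor location. However, radiotherapy systems have a latency inherent to robot control limitations that impedes the precision of radiation delivery. Prediction with online learning of recurrent neural networks (RNNs) allows adaptation to non-stationary respiratory signals, but classical methods such as RTRL and truncated BPTT are respectively slow and biased. This study investigates the capability of unbiased online recurrent optimization (UORO) to forecast respiratory motion and enhance safety in lung radiotherapy. We used 9 observation records of the 3D positions of external markers on the chest and abdomen of healthy individuals breathing during intervals from 73 s to 222 s. The sampling frequency was 10 Hz, and the amplitudes of the recorded trajectories range from 6 mm to 40 mm in the superior-inferior direction. We forecast the 3D location of each marker simultaneously with horizon values between 0.1 s and 2.0 s, using an RNN trained with UORO. We compare its performance with an RNN trained with RTRL, LMS, and offline linear regression. We provide closed-form expressions for the quantities involved in the loss gradient calculation in UORO, thereby making its implementation efficient. Training and cross-validation were performed during the first minute of each sequence. On average over the horizon values considered and the 9 sequences, UORO achieves the lowest root-mean-square (RMS) error and maximum error among the compared algorithms. These errors are respectively equal to 1.3 mm and 8.8 mm, and the prediction time per time step is lower than 2.8 ms (Dell Intel Core i9-9900K 3.60 GHz). Linear regression has the lowest RMS error for the horizon values 0.1 s and 0.2 s, followed by LMS for horizon values between 0.3 s and 0.5 s, and UORO for horizon values greater than 0.6 s.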
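As a rough illustration of the LMS baseline and the RMS error metric mentioned above, the following sketch forecasts a 1D signal a fixed number of samples ahead with a linear filter updated online. The window length, step size, zero initialization, and the names `lms_forecast` and `rms_error` are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def lms_forecast(signal, window=3, horizon=1, step_size=0.1):
    """Online least-mean-squares forecast of signal[t + horizon].

    At time t the filter only sees samples up to t, so the weights are
    updated with the error of the forecast issued at time t - horizon,
    whose target signal[t] has just become available.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    weights = np.zeros(window)
    predictions = np.full(n, np.nan)  # NaN where no forecast exists yet

    for t in range(window - 1, n - horizon):
        past_t = t - horizon
        if past_t >= window - 1:
            past_input = signal[past_t - window + 1 : past_t + 1]
            error = signal[t] - weights @ past_input
            weights += step_size * error * past_input  # LMS update

        current_input = signal[t - window + 1 : t + 1]
        predictions[t + horizon] = weights @ current_input
    return predictions, weights

def rms_error(predictions, targets):
    """Root-mean-square error, ignoring time steps without a forecast."""
    diff = np.asarray(predictions, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.sqrt(np.nanmean(diff ** 2)))

# Toy usage on a synthetic breathing-like signal sampled at 10 Hz.
time = np.arange(0, 60, 0.1)
trace = np.sin(2 * np.pi * time / 4.0)
forecast, _ = lms_forecast(trace, window=5, horizon=3, step_size=0.05)
error = rms_error(forecast, trace)
```

A plain LMS update can diverge when the step size is too large relative to the signal power, which is one reason a normalized variant is often preferred in practice.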
Human civilization has an increasingly powerful influence on the earth system. Affected by climate change and land-use change, natural disasters such as flooding have been increasing in recent years. Earth observations are an invaluable source for assessing and mitigating negative impacts. Detecting changes from Earth observation data is one way to monitor the possible impact. Effective and reliable Change Detection (CD) methods can help in identifying the risk of disaster events at an early stage. In this work, we propose a novel unsupervised CD method on time series Synthetic Aperture Radar (SAR) data. Our proposed method is a probabilistic model trained with unsupervised learning techniques, reconstruction, and contrastive learning. The change map is generated with the help of the distribution difference between pre-incident and post-incident data. Our proposed CD model is evaluated on flood detection data. We verified the efficacy of our model on 8 different flood sites, including three recent flood events from Copernicus Emergency Management Services and six from the Sen1Floods11 dataset. Our proposed model achieved an average of 64.53% Intersection over Union (IoU) value and 75.43% F1 score. Our achieved IoU score is approximately 6-27% and F1 score is approximately 7-22% better than the compared unsupervised and supervised existing CD methods. The results and extensive discussion presented in the study show the effectiveness of the proposed unsupervised CD method.
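The two metrics reported above, IoU and F1, can both be derived from pixel counts on a binary change map. The sketch below assumes boolean masks of equal shape; the function name `iou_and_f1` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def iou_and_f1(pred_mask, true_mask):
    """Intersection over Union and F1 score for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)

    tp = np.logical_and(pred, true).sum()    # changed pixels found
    fp = np.logical_and(pred, ~true).sum()   # false alarms
    fn = np.logical_and(~pred, true).sum()   # missed changes

    union = tp + fp + fn
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0, 1.0

    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)

# Toy usage on a four-pixel change map.
iou, f1 = iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single pair of masks the two scores are tied by F1 = 2 IoU / (1 + IoU), so F1 is never lower than IoU; averages over several sites need not satisfy this identity exactly.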
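The essence of video semantic segmentation (VSS) is how to leverage temporal information for prediction. Previous efforts are mainly devoted to developing new techniques for calculating cross-frame affinities, such as optical flow and attention. Instead, this paper contributes from a different angle by mining relations among cross-frame affinities, upon which better temporal information aggregation can be achieved. We explore relations among affinities in two aspects: single-scale intrinsic correlations and multi-scale relations. Inspired by traditional feature processing, we propose Single-scale Affinity Refinement (SAR) and Multi-scale Affinity Aggregation (MAA). To make MAA feasible, we propose a Selective Token Masking (STM) strategy that selects a subset of consistent reference tokens for different scales when calculating affinities, which also improves the efficiency of our method. Finally, the cross-frame affinities strengthened by SAR and MAA are adopted to adaptively aggregate temporal information. Our experiments demonstrate that the proposed method performs favorably against state-of-the-art VSS methods. The code is publicly available at https://github.com/guoleisun/vss-mrcfa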
Implicit fields have been very effective to represent and learn 3D shapes accurately. Signed distance fields and occupancy fields are the preferred representations, both with well-studied properties, despite their restriction to closed surfaces. Several other variations and training principles have been proposed with the goal to represent all classes of shapes. In this paper, we develop a novel and yet fundamental representation by considering the unit vector field defined on 3D space: at each point in $\mathbb{R}^3$ the vector points to the closest point on the surface. We theoretically demonstrate that this vector field can be easily transformed to surface density by applying the vector field divergence. Unlike other standard representations, it directly encodes an important physical property of the surface, which is the surface normal. We further show the advantages of our vector field representation, specifically in learning general (open, closed, or multi-layered) surfaces as well as piecewise planar surfaces. We compare our method on several datasets including ShapeNet where the proposed new neural implicit field shows superior accuracy in representing any type of shape, outperforming other standard methods. The code will be released at https://github.com/edomel/ImplicitVF
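To make the representation above concrete, the following sketch evaluates the unit vector field for a surface given only as a set of sample points, using a brute-force nearest-neighbor search. It illustrates the target field, not the neural model from the paper; the point-sampled surface, the name `closest_point_unit_vectors`, and the zero vector returned for queries lying on the surface are assumptions.

```python
import numpy as np

def closest_point_unit_vectors(query_points, surface_points, eps=1e-12):
    """Unit vectors from each query point toward its closest surface sample.

    query_points: (m, 3), surface_points: (n, 3). Returns (m, 3).
    """
    queries = np.asarray(query_points, dtype=float)
    surface = np.asarray(surface_points, dtype=float)

    # Offsets from every query to every surface sample: shape (m, n, 3).
    offsets = surface[None, :, :] - queries[:, None, :]

    # Index of the closest surface sample for each query.
    nearest = np.argmin((offsets ** 2).sum(axis=-1), axis=1)
    to_surface = offsets[np.arange(queries.shape[0]), nearest]

    # Normalize; queries that sit on the surface get the zero vector
    # (assumption, since the direction is undefined there).
    norms = np.linalg.norm(to_surface, axis=1, keepdims=True)
    return np.where(norms > eps, to_surface / np.maximum(norms, eps), 0.0)

# Toy usage: a "surface" made of two sample points.
surface = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
queries = np.array([[0.0, 0.0, 2.0], [1.0, 3.0, 0.0], [1.0, 0.0, 0.0]])
field = closest_point_unit_vectors(queries, surface)
```

Unlike a signed distance, this field needs no inside/outside notion, which is why it remains defined for open or multi-layered surfaces.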
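Boundaries are among the primary visual cues used by human and computer vision systems. One of the key problems in boundary detection is the label representation, which typically leads to class imbalance and, as a consequence, to thick boundaries that require non-differentiable post-processing steps to be thinned. In this paper, we reinterpret boundaries as 1D surfaces and formulate a one-to-one vector transform function that allows boundary prediction to be trained while completely avoiding the class imbalance issue. Specifically, we define the boundary representation at any point as the unit vector pointing to the closest boundary surface. Our problem formulation leads to the estimation of direction as well as richer contextual information about the boundary and, if desired, makes zero-pixel thin boundaries available at training time as well. Our method uses no hyperparameter in the training loss and a fixed, stable hyperparameter at inference. We provide theoretical justification and discussion of the vector transform representation. We evaluate the proposed loss using a standard architecture and show excellent performance over other losses and representations on several datasets. Code is available at https://github.com/edomel/boundaryvt.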
Efficient detection and description of geometric regions in images is a prerequisite in visual systems for localization and mapping. Such systems still rely on traditional hand-crafted methods for efficient generation of lightweight descriptors, a common limitation of the more powerful neural network models that come with high compute and specific hardware requirements. In this paper, we focus on the adaptations required by detection and description neural networks to enable their use in computationally limited platforms such as robots, mobile, and augmented reality devices. To that end, we investigate and adapt network quantization techniques to accelerate inference and enable its use on compute limited platforms. In addition, we revisit common practices in descriptor quantization and propose the use of a binary descriptor normalization layer, enabling the generation of distinctive binary descriptors with a constant number of ones. ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size, by at least an order of magnitude when compared to full-precision counterparts. These improvements come at a minor performance degradation as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization. Code and trained models will be released upon acceptance.
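One simple way to obtain binary descriptors with a constant number of ones, and to match them, is sketched below. It uses a hard top-k selection at inference time, which is an assumption made for illustration and not the normalization layer proposed in the paper; the names `topk_binarize` and `hamming_match` are likewise hypothetical.

```python
import numpy as np

def topk_binarize(descriptors, num_ones):
    """Set the num_ones largest entries of each row to 1, the rest to 0."""
    values = np.asarray(descriptors, dtype=float)
    # argpartition places the indices of the num_ones largest values first.
    top = np.argpartition(-values, num_ones - 1, axis=1)[:, :num_ones]
    binary = np.zeros(values.shape, dtype=np.uint8)
    np.put_along_axis(binary, top, 1, axis=1)
    return binary

def hamming_match(binary_a, binary_b):
    """Nearest neighbor in binary_b for each row of binary_a.

    Returns (indices, distances), where distances[i, j] counts the bits
    that differ between binary_a[i] and binary_b[j].
    """
    distances = (binary_a[:, None, :] != binary_b[None, :, :]).sum(axis=-1)
    return distances.argmin(axis=1), distances

# Toy usage: two 4-dimensional descriptors, two ones each.
real_valued = np.array([[0.1, 0.9, 0.5, 0.3],
                        [0.8, 0.2, 0.7, 0.1]])
codes = topk_binarize(real_valued, num_ones=2)
matches, distances = hamming_match(codes, codes)
```

With a fixed number of ones per descriptor, the Hamming distance between two codes is always even and bounded by twice that number, so distances are directly comparable across descriptors.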
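Traditional domain adaptive semantic segmentation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many real-world settings, the target domain task requires a taxonomy different from the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive cross-domain semantic segmentation (TACS) problem, which allows for inconsistent taxonomies between the two domains. We further propose an approach that jointly addresses image-level and label-level domain adaptation. On the label level, we employ a bilateral mixed sampling strategy to augment the target domain and a relabeling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class-discriminative features. We extensively evaluate the effectiveness of our framework under different TACS settings: open taxonomy, coarse-to-fine taxonomy, and implicitly overlapping taxonomy. Our approach outperforms the previous state of the art by a large margin while being capable of adapting to target taxonomies. Our implementation is publicly available at https://github.com/ethruigong/tada.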
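This paper tackles the low-efficiency flaw of vision transformers caused by the high computational and space complexity of Multi-Head Self-Attention (MHSA). To this end, we propose Hierarchical MHSA (H-MHSA), whose representation is computed in a hierarchical manner. Specifically, we first divide the input image into patches, as is commonly done, and each patch is viewed as a token. The proposed H-MHSA then learns token relationships within local patches, serving as local relationship modeling. Next, the small patches are merged into larger ones, and H-MHSA models the global dependencies among the small number of merged tokens. Finally, the local and global attentive features are aggregated to obtain features with powerful representation capacity. Since we only compute attention for a limited number of tokens at each step, the computational load is reduced dramatically. Hence, H-MHSA can efficiently model global relationships among tokens without sacrificing fine-grained information. With the H-MHSA module incorporated, we build a family of Hierarchical-Attention-based Transformer Networks, namely HAT-Net. To demonstrate the superiority of HAT-Net in scene understanding, we conduct extensive experiments on fundamental vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation. HAT-Net thus provides a new perspective for vision transformers. Code and pretrained models are available at https://github.com/yun-liu/hat-net.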
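The local-then-global idea described above can be sketched on a 1D token sequence as follows. This is a heavily simplified illustration under stated assumptions: a single head, no learned projections, average pooling to merge tokens, and summation to aggregate the two branches. None of these choices are confirmed by the abstract, and the name `hierarchical_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over the last two axes."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def hierarchical_attention(tokens, window):
    """Local attention inside windows, then global attention on merged tokens."""
    n, d = tokens.shape
    if n % window != 0:
        raise ValueError("token count must be divisible by the window size")

    # Step 1: attention restricted to each local window of tokens.
    groups = tokens.reshape(n // window, window, d)
    local = attention(groups, groups, groups)

    # Step 2: merge each window into one token (assumed: average pooling)
    # and model global dependencies among the few merged tokens.
    merged = local.mean(axis=1)
    global_context = attention(merged, merged, merged)

    # Step 3: aggregate local and global features (assumed: summation).
    return (local + global_context[:, None, :]).reshape(n, d)

def attention_pairs(n, window):
    """Token pairs scored by full attention versus the two-step scheme."""
    return n * n, n * window + (n // window) ** 2

# Toy usage: 16 tokens of dimension 8, windows of 4 tokens.
rng = np.random.default_rng(0)
features = hierarchical_attention(rng.normal(size=(16, 8)), window=4)
```

In this sketch, 16 tokens with windows of 4 require 16 * 4 + 4 * 4 = 80 scored pairs instead of 16 * 16 = 256 for full attention, which is the kind of saving the abstract refers to.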